PRODUCTION OF PEPTIDES AND PROTEINS BY ACCUMULATION IN PLANT
ENDOPLASMIC RETICULUM-DERIVED PROTEIN BODIES 

FIELD OF THE INVENTION 

The invention relates to the production of peptides and proteins of interest in plant host
systems by accumulation thereof in endoplasmic reticulum-derived protein bodies, to nucleic
acids encoding such products and to the use of said nucleic acids in the manufacture of
constructs and vectors for transforming plant host systems. A method for expressing and
isolating heterologous products of interest, such as calcitonin (CT), in plants is specifically
disclosed.

BACKGROUND OF THE INVENTION 

As the demand for biopharmaceuticals is expected to increase considerably because of
the remarkable advances in genome knowledge and in related biomedical research, there is
considerable interest in developing low-cost recombinant production systems.

Genetic engineering of plants to produce biopharmaceuticals is relatively recent, since
other transgenic systems, including bacteria, fungi and cultured mammalian cells, have long
been widely adopted for bioproduction. Nevertheless, some recombinant therapeutic proteins
produced in plant expression systems are already on the market or in various stages of human
clinical trials, such as hirudin, an anticoagulant protein to treat thrombosis (Parmenter et al.,
1995), a chimeric IgG-IgA vaccine against dental caries (Ma et al., 1998), a bacterial
vaccinogen against an enterotoxigenic strain of E. coli (Haq et al., 1995), and a recombinant
dog gastric lipase to treat cystic fibrosis (Benicourt et al., 1993).

Plant expression systems are attractive because the expression level of recombinant
proteins can be enhanced by exploiting the innate sorting and targeting mechanisms that
plants use to target host proteins to organelles. Moreover, plant-derived biopharmaceuticals
can be easily scaled up for mass production and have the advantage of minimizing health risks
arising from contamination with pathogens or toxins.

Plants increasingly appear to be an attractive expression system because of their
potential to provide unlimited quantities of biologically active material at low production
costs and with reduced health risks. The capacity of plants to accumulate high levels of
recombinant proteins and to perform most post-translational modifications makes them
suitable bioreactors for molecular farming of recombinant therapeutics (for review see
Fischer and Emans, 2000). However, important decisions concerning crop species selection,
tissue choice, expression and recovery strategies and post-translational processing
determine the feasibility of plant-based production for commercialization (Cramer et
al., 1999).

Subcellular targeting of recombinant proteins is an important consideration for high
level accumulation and correct assembly and folding of such proteins in plants.
Compartmentalization of host proteins into intracellular storage organelles is generally
achieved using appropriate signal peptides or whole protein fusions. A variety of recombinant
therapeutic proteins have been targeted to the following plant compartments: the apoplastic
space (McCormick et al., 1999), chloroplasts (Staub et al., 2000) and the endoplasmic
reticulum (ER) (Stoger et al., 2000). Immunoglobulins directed to the ER compartment in
transgenic plants have been shown to give 10-100 fold higher yields than when targeted to
other compartments such as the apoplasm or the cytosol (Conrad and Fiedler, 1998).

The targeting of complex proteins such as antibodies to the ER compartment is
particularly interesting because most of the post-translational modifications required to obtain
a functional product take place inside the ER (During et al., 1990; Ma and Hein, 1995;
Conrad and Fiedler, 1998). Indeed, within the ER, the signal peptide is cleaved, and stress
proteins such as the binding IgG protein (BiP) and enzymes such as protein disulphide
isomerase (PDI) function as chaperones, bind to the unassembled protein and direct
subsequent folding and assembly. In addition to these particular characteristics, available
evidence indicates that the plant ER is highly flexible, making it an ideal reservoir for
heterologous pharmaceutical proteins. The ER, even though it appears to be the gateway to
the secretory pathway, is also able to store proteins for short or long periods of time. Plants
store amino acids for long periods in the form of specific storage proteins. One mechanism to
protect these storage proteins against uncontrolled premature degradation is to deposit them
into ER-derived storage organelles called protein bodies (PB) (for review, Müntz, 1998). The
assembly of such organelles, like the simple accumulation of recombinant proteins in the ER
lumen, requires as a first step the retention of the host protein. Secretory proteins, when
correctly folded and assembled in the ER, have a variety of cellular destinations, mostly
reached by progression via the Golgi apparatus. However, ER retention of soluble
transport-competent proteins can be induced by the carboxy-terminal retention/retrieval signal
KDEL (or HDEL) (Munro et al., 1987; Wandelt et al., 1992; Vitale et al., 1993). This conserved
C-terminal motif, recognized in the Golgi apparatus by transmembrane receptors, permits the
recycling of escaped ER-resident proteins back to the ER (Vitale and Denecke, 1999;
Yamamoto et al., 2001). Many recombinant antibody fragments have been extended with the
KDEL signal in order to be stably accumulated in the plant ER (Verch et al., 1998; Torres et
al., 1999). An alternative way to generate retention and accumulation of recombinant proteins
in the ER compartment is to create an appropriate fusion with a natural ER resident such as a
seed storage protein.

WO 01/75312 discloses a method for producing a cytokine in a plant host system
wherein said plant host system has been transformed with a chimeric nucleic acid sequence
encoding said cytokine, said chimeric nucleic acid sequence comprising a first nucleic acid
sequence capable of regulating the transcription in said plant host system of a second nucleic
acid sequence wherein said second nucleic acid sequence encodes a signal sequence that is
linked in reading frame to a third nucleic acid sequence encoding a cytokine and a fourth
nucleic acid sequence linked in reading frame to the 3' end of said third nucleic acid sequence
encoding a "KDEL" amino acid sequence.

Zeins are a group of proteins that are synthesized during endosperm development in
corn and may be separated into four groups, α, β, γ and δ, based on their solubility. Zeins can
aggregate into PBs directly in the ER. Plants or plant tissues comprising rumen stable protein
bodies expressed as fusion proteins comprising a full-length zein protein and an operably
linked proteinaceous material have been disclosed (WO 00/40738).

γ-Zein, a maize storage protein, is one of the four maize prolamins and represents 10-
15% of the total protein in the maize endosperm. Like other cereal prolamins, α- and γ-zeins
are biosynthesized on membrane-bound polysomes at the cytoplasmic side of the rough ER,
assembled within the lumen and then sequestered into ER-derived PBs (Herman and Larkins,
1999; Ludevid et al., 1984; Torrent et al., 1986). γ-Zein is composed of four characteristic
domains: i) a signal peptide of 19 amino acids, ii) the repeat domain containing eight units of
the hexapeptide PPPVHL (53 aa), iii) the ProX domain, where proline residues alternate with
other amino acids (29 aa), and iv) the hydrophobic cysteine-rich C-terminal domain (111 aa).
The ability of γ-zein to assemble in ER-derived PBs is not restricted to seeds. In fact, when
the γ-zein gene was constitutively expressed in transgenic Arabidopsis plants, the storage
protein accumulated within ER-derived PBs in leaf mesophyll cells (Geli et al., 1994). In the
search for a signal responsible for γ-zein deposition into ER-derived PBs (prolamins do not
have a KDEL signal), it has been demonstrated that the proline-rich N-terminal domain,
including the tandem repeat domain, was necessary for ER retention and that the C-terminal
domain was involved in PB formation. However, the mechanisms by which these domains
promote PB assembly are still unknown.

Calcitonin (CT), a 32 amino acid hormonal peptide, is essential for correct calcium
metabolism and has found widespread clinical use in the treatment of osteoporosis,
hypercalcemic shock and Paget's disease (Reginster et al., 1993; Azria et al., 1995; Silverman
et al., 1997). Human CT is synthesized as a preproprotein with a signal peptide of 25 amino
acids and two propeptides at the N- and C-terminus (57 aa and 21 aa respectively). The
resultant active peptide is 32 amino acids long, with a single disulphide bridge (Cys1-Cys7),
and is amidated at the carboxy terminus. In vitro, human CT aggregates, which limits its
usefulness as a therapeutic. Consequently, salmon CT, which is less prone to aggregate, is
commonly used instead (Cudd et al., 1995). Production of CT is currently achieved by
chemical synthesis, but the cost of this production has encouraged some research groups to
explore alternative approaches. Human and salmon CT have been produced in E. coli (Ray et
al., 1993; Hong et al., 2000), in mouse pituitary cells (Merli et al., 1996), in the nonendocrine
cell lines Cos-7 and CHO (Takahashi et al., 1997) and more recently in the milk of transgenic
rabbits (McKee et al., 1998). Production of bioactive calcitonin by biotechnological methods
requires at least two processing steps: i) generation of a glycine-extended calcitonin
(Bradbury et al., 1988) and ii) formation of a carboxy-terminal prolinamide via the action of
the amidation enzyme, peptidyl glycine α-amidating monooxygenase (PAM) (Eipper et al.,
1992). Since it is not currently known whether carboxyl-amidation occurs in plant cells, in
vitro amidation of plant glycine-extended calcitonin with the PAM enzyme would provide the
C-terminal amide (Ray et al., 1993).

SUMMARY OF THE INVENTION 

The problem to be solved by the present invention is to provide an alternative system for
producing peptides and proteins of interest in a plant host system.

The solution presented herein is based on the ability of proline-rich domains of γ-zein
to self-assemble and to confer stability to fusion proteins in the ER of a plant host system. The
use of a γ-zein fusion protein-based system to accumulate a product of interest in a plant host
system constitutes a successful approach to accumulating said product of interest within
ER-derived PBs of plants.

The invention is illustrated in the Example, wherein a fusion protein-based system to
accumulate recombinant CT in ER-derived PBs in tobacco plants is described. Various
proline-rich domains were engineered from γ-zein to serve as fusion partners through a
cleavable protease site. The mature calcitonin coding region was fused at the C-terminus of
the γ-zein domains and expressed in transgenic tobacco plants. The fusion proteins were
accumulated in ER-derived PBs in tobacco leaves. After purification, the fusion proteins were
subjected to enterokinase cleavage, permitting the release of calcitonin.

Accordingly, an aspect of this invention relates to a nucleic acid sequence comprising
(i) a nucleotide sequence encoding the protein γ-zein, or a fragment thereof containing a
nucleotide sequence that encodes an amino acid sequence capable of directing and retaining a
protein towards the endoplasmic reticulum; (ii) a nucleotide sequence encoding an amino acid
sequence that is specifically cleavable by enzymatic or chemical means; and (iii) a nucleotide
sequence encoding a product of interest; wherein said nucleotide sequences are operatively
linked to one another.

In another aspect, this invention relates to a nucleic acid construct comprising said
nucleic acid sequence.

In a further aspect, the invention relates to a vector containing said sequence or 
construct and to a cell transformed with said vector. 

In a further aspect, the invention relates to a transformed plant host system having said
nucleic acid sequence, construct or vector.

In a further aspect, the invention relates to a transgenic plant host system comprising, 
integrated in its genome, said nucleic acid sequence. 

In a further aspect, the invention relates to a method for producing a product of interest 
in a plant host system. 

In a further aspect, the invention relates to a method for producing calcitonin in a plant
host system.

In a further aspect, the invention relates to a fusion protein, said fusion protein having 
an amino acid sequence corresponding to the above mentioned nucleic acid sequence. 

BRIEF DESCRIPTION OF DRAWINGS

Figure 1 shows the nucleotide sequences and translations of γ-zein (Figure 1.A) and
γ-zein derivatives RX3 (Figure 1.B, upper), R3 (Figure 1.B, lower), P4 (Figure 1.C, upper) and
X10 (Figure 1.C, lower).

Figure 2 shows the nucleotide sequence (lane 2) and translation (lane 1) of synthetic
calcitonin (CT). The synthetic CT gene was constructed using preferential plant codon
usage. Codon modifications are underlined in comparison to the wild type salmon CT gene
(lane 3). The synthetic gene contains at its 5' end a linker sequence corresponding to the
enterokinase cleavage site (EK) and is extended at its 3' end to produce a single C-terminal
glycine.

Figure 3 shows a schematic outline for the construction of the pCRX3CT plasmid. The
process represented was the same for obtaining the plasmids pCZeinCT, pCR3CT, pCP4CT
and pCX10CT, the difference among them being the corresponding γ-zein or γ-zein derived
sequences introduced. The different plasmids are not depicted in proportion.

Figure 4 shows a schematic representation of plasmids pBZeinCT, pBRX3CT,
pBR3CT, pBP4CT and pBX10CT. The different plasmids are not depicted in proportion.

Figure 5 shows a schematic representation of the different fusion proteins. γ-Zein and
γ-zein derived domains (RX3, R3, P4 and X10) were fused to calcitonin (CT) through the
enterokinase cleavable site (EK). SP, signal peptide; REPEAT, repeat domain (PPPVHL),
eight units; R1, one repeat unit; Pro-X, proline-Xaa; PX, fragment of Pro-X domain; C-term,
cysteine-rich C-terminal domain; N, N-terminal sequence of the mature protein. The number
of amino acids in each fusion protein is indicated at the right.

Figure 6 shows the results of an immunoblot analysis of the fusion proteins in
transgenic tobacco plants using γ-zein antiserum. Soluble proteins were extracted from wild
type (WT) and transgenic tobacco (T0) leaves, separated on 15% SDS-polyacrylamide gels
(20 µg per lane) and transferred to nitrocellulose. Numbers represent the independent
transgenic lines obtained for the different chimeric genes: γ-zein-CT, RX3-CT, R3-CT and
P4-CT.

Figure 7 shows: A. Comparative western blot analysis of the different recombinant
fusion proteins using CT antiserum. Soluble protein extracts were prepared from wild type
plants (WT) and transgenic tobacco lines (T1) having the maximum fusion protein expression
of the related chimeric gene. 8 µg of soluble proteins were loaded on a 15% SDS-
polyacrylamide gel and transferred to nitrocellulose. B. Comparative northern blot analysis of
the different chimeric gene transcripts. Total RNAs were isolated from the transgenic lines
analyzed by immunoblot (Figure 7A), fractionated by denaturing formamide gel
electrophoresis (30 ng per lane) and capillary blotted onto nylon membrane. Blots were
hybridized with a random primed probe (129 bases) obtained from calcitonin cDNA.

Figure 8 shows the subcellular localization of RX3-CT and P4-CT proteins in
transgenic tobacco plants: (A) Immunolocalization of RX3-CT protein in RX3-CT transgenic
lines using CT antiserum (dilution 1:100). (B) Immunolocalization of P4-CT protein in P4-CT
transgenic lines using CT antiserum (dilution 1:100). (C) Immunolocalization of RX3-CT
protein in RX3-CT transgenic lines using γ-zein antiserum (dilution 1:1500). (D)
Immunolocalization of BiP protein in RX3-CT transgenic lines using BiP antiserum (dilution
1:250). (E) Immunolocalization in wild type plants using γ-zein antiserum (dilution 1:1500).
(F) Immunolocalization in RX3-CT transgenic plants without primary antibody
(dilution 1:1500). Immunocytochemistry on tobacco leaf sections was performed by using
the primary antibodies indicated and protein A-colloidal gold (15 nm). cw: cell wall; ch:
chloroplast; pb: protein body; v: vacuole.

Figure 9 shows the results of the immunoblot analysis of RX3-CT and P4-CT fusion
protein EK cleavage. 12 µg of each partially purified fusion protein were incubated with 0.2
U EK for 24 hours at 20°C. Digested fusion proteins were fractionated by 18% Tris-
Tricine polyacrylamide gel electrophoresis and transferred to nitrocellulose. Lanes 1,
non-digested fusion proteins (1 µg); lanes 2, digestion products; lanes 3, synthetic salmon CT
standard.

Figure 10 shows the results of RP-HPLC fractionation of RX3-CT fusion protein
digested by EK. pCT released from RX3-CT fusion protein was detected in fraction 3 (Tr =
13 min) by TOF-MALDI using synthetic salmon CT as standard.

Figure 11 shows the results of TOF-MALDI mass spectrometry characterization of
(A) synthetic salmon CT (MW = 3433.24) and (B) plant CT (MW = 3491.93) eluted at Tr =
13 min from the RP-HPLC fractionation.
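
The two masses reported for Figure 11 can be compared with a short calculation. The sketch below is illustrative only: it assumes that the plant-derived peptide differs from the amidated synthetic standard by one additional C-terminal glycine residue and by a free carboxyl group in place of the amide (in line with the glycine-extended design described for Figure 2). The reference masses GLY_RESIDUE and AMIDE_TO_ACID are rounded textbook average values and are not taken from this document.

```python
# Masses reported for Figure 11 (TOF-MALDI).
MW_SYNTHETIC_SCT = 3433.24   # synthetic salmon CT standard
MW_PLANT_CT = 3491.93        # plant CT released by EK digestion

# Assumed reference values (average masses, Da); not part of the source.
GLY_RESIDUE = 57.05          # one glycine residue (C2H3NO)
AMIDE_TO_ACID = 0.98         # C-terminal -CONH2 replaced by -COOH


def observed_shift(plant=MW_PLANT_CT, synthetic=MW_SYNTHETIC_SCT):
    """Mass difference between plant CT and the synthetic standard."""
    return round(plant - synthetic, 2)


def expected_gly_extension_shift():
    """Shift expected if the only change is a Gly extension with a free acid."""
    return round(GLY_RESIDUE + AMIDE_TO_ACID, 2)


if __name__ == "__main__":
    print("observed shift (Da):", observed_shift())
    print("expected shift (Da):", expected_gly_extension_shift())
```

Under these assumptions the observed and expected shifts differ by less than 1 Da, which is only a plausibility check and not a substitute for the characterization reported in the Example.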

DETAILED DESCRIPTION OF THE INVENTION

The first aspect of the invention provides a nucleic acid sequence, hereinafter referred to
as the nucleic acid sequence of the invention, comprising:

a first nucleic acid sequence containing the nucleotide sequence that encodes the
protein γ-zein, or a fragment thereof containing a nucleotide sequence that encodes an amino
acid sequence capable of directing and retaining a protein towards the endoplasmic reticulum
(ER);

a second nucleic acid sequence containing a nucleotide sequence that encodes an
amino acid sequence that is specifically cleavable by enzymatic or chemical means; and

a third nucleic acid sequence containing the nucleotide sequence that encodes a
product of interest;

wherein the 3' end of said first nucleic acid sequence is linked to the 5' end of said
second nucleic acid sequence and the 3' end of said second nucleic acid sequence is linked to
the 5' end of said third nucleic acid sequence.

The first nucleic acid sequence contains the nucleotide sequence that encodes the
protein γ-zein, or a fragment thereof containing a nucleotide sequence that encodes an amino
acid sequence capable of directing and retaining a protein towards the ER.

The term "γ-zein" as used herein refers to a maize storage protein which is composed
of the four characteristic domains mentioned previously in the Background of the Invention.
Said term includes native γ-zein proteins, as well as variants thereof and recombinant γ-zein
proteins which are capable of directing and retaining a protein towards the ER.

Practically any nucleotide sequence encoding a γ-zein protein, or a fragment thereof
containing a nucleotide sequence that encodes an amino acid sequence capable of directing
and retaining a protein towards the ER, may be used.

Accordingly, in a preferred embodiment, the first nucleic acid sequence contains the
nucleotide sequence encoding the full-length γ-zein protein. In a particular embodiment, the
nucleotide sequence encoding a full-length γ-zein protein is shown in Figure 1A and
identified in SEQ ID NO: 1.

In another preferred embodiment, the first nucleic acid sequence contains a nucleotide
sequence encoding a fragment of γ-zein protein, said fragment containing a nucleotide
sequence that encodes an amino acid sequence capable of directing and retaining a protein
towards the ER. In this case, the first nucleic acid sequence may contain:

- one or more nucleotide sequences encoding all or part of the repeat domain of
the protein γ-zein;

- one or more nucleotide sequences encoding all or part of the ProX domain of the
protein γ-zein; or

- one or more nucleotide sequences encoding all or part of the repeat domain of
the protein γ-zein, and one or more nucleotide sequences encoding all or part of the
ProX domain of the protein γ-zein.

In a particular embodiment, said first nucleic acid sequence contains a nucleotide
sequence encoding a fragment of γ-zein protein, said fragment containing a nucleotide
sequence that encodes an amino acid sequence capable of directing and retaining a protein
towards the ER, selected from the group consisting of:

- the nucleotide sequence shown in SEQ ID NO: 2 [nucleotide sequence identified
as RX3 (Figure 1B)],

- the nucleotide sequence shown in SEQ ID NO: 3 [nucleotide sequence identified
as R3 (Figure 1B)],

- the nucleotide sequence shown in SEQ ID NO: 4 [nucleotide sequence identified
as P4 (Figure 1C)], and

- the nucleotide sequence shown in SEQ ID NO: 5 [nucleotide sequence identified
as X10 (Figure 1C)].

The second nucleic acid sequence contains a nucleotide sequence that encodes an
amino acid sequence that is specifically cleavable by enzymatic or chemical means. In a
particular embodiment, said second nucleic acid sequence comprises a nucleotide sequence
that encodes a protease cleavage site, for example, an amino acid site cleavable by a protease
such as an enterokinase, Arg-C endoprotease, Glu-C endoprotease, Lys-C endoprotease,
factor Xa and the like.

Alternatively, the second nucleic acid sequence comprises a nucleotide sequence that
encodes an amino acid that is specifically cleavable by a chemical reagent, such as, for
example, cyanogen bromide, which cleaves at methionine residues, or any other suitable
chemical reagent.
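
The choice among these cleavage agents interacts with the sequence of the product of interest, because an agent that also cuts inside the product would destroy it. The following is a minimal sketch of that check, not the method used in the Example. It assumes simplified textbook specificities (enterokinase after DDDDK, factor Xa after I(E/D)GR, Arg-C after R, Lys-C after K, Glu-C after E, cyanogen bromide after M), a glycine-extended salmon calcitonin written in one-letter code, and a hypothetical carrier made of eight PPPVHL units. None of these strings is copied from the SEQ ID listings.

```python
import re

# Simplified specificities: each pattern matches the residues that end
# immediately before the cleaved bond (cleavage occurs after the match).
CLEAVAGE_RULES = {
    "enterokinase": r"DDDDK",
    "factor Xa": r"I[ED]GR",
    "Arg-C": r"R",
    "Lys-C": r"K",
    "Glu-C": r"E",
    "cyanogen bromide": r"M",
}

# Assumed sequences, for illustration only.
SCT_GLY = "CSNLSTCVLGKLSQELHKLQTYPRTNTGSGTPG"  # salmon CT plus C-terminal Gly
CARRIER = "PPPVHL" * 8                          # hypothetical repeat-domain carrier
EK_SITE = "DDDDK"                               # enterokinase recognition site


def cut_positions(seq, agent):
    """Indices after which `agent` cuts `seq` (a cut at the very end is ignored)."""
    pattern = CLEAVAGE_RULES[agent]
    return [m.end() for m in re.finditer(pattern, seq) if m.end() < len(seq)]


def digest(seq, agent):
    """Fragments produced by a complete digestion of `seq` with `agent`."""
    fragments, start = [], 0
    for pos in cut_positions(seq, agent):
        fragments.append(seq[start:pos])
        start = pos
    fragments.append(seq[start:])
    return fragments


def internal_cuts(product):
    """Number of cuts each agent would make inside the product itself."""
    return {agent: len(cut_positions(product, agent)) for agent in CLEAVAGE_RULES}


if __name__ == "__main__":
    fusion = CARRIER + EK_SITE + SCT_GLY
    print(digest(fusion, "enterokinase"))
    print(internal_cuts(SCT_GLY))
```

In this toy case enterokinase releases the peptide intact, whereas Lys-C, Arg-C and Glu-C would each also cut inside it, so the cleavable site has to be chosen with the product sequence in mind.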

The second nucleic acid sequence may be generated as a result of the union between
said first nucleic acid sequence and said third nucleic acid sequence. In that case, each
sequence contains a number of nucleotides such that, when said first and third nucleic
acid sequences become linked, a functional nucleotide sequence that encodes an amino acid
sequence that is specifically cleavable by enzymatic or chemical means, i.e., the second
nucleic acid sequence, is formed. In an alternative embodiment, the second nucleic acid
sequence is a foreign sequence operatively inserted between said first and third nucleic acid
sequences.

The third nucleic acid sequence contains the nucleotide sequence that encodes a
product of interest. In principle, any product of interest may be expressed by the system
provided by the instant invention. In a preferred embodiment, the product of interest is a
proteinaceous (i.e., a protein or peptide) drug, for example, a peptide hormone, such as
calcitonin, erythropoietin, thrombopoietin, growth hormone and the like, an interferon, i.e., a
protein produced in response to viral infections and as a cytokine during an immune response,
etc. Preferably, said therapeutic products of interest are effective for treating the human or
animal body.

In a particular embodiment, the third nucleic acid sequence comprises a nucleotide
sequence encoding calcitonin (CT), for example, human calcitonin (hCT) or salmon
calcitonin (sCT). In general, in this case, said third nucleic acid sequence preferably includes
a codon for glycine at the 3' end of said nucleic acid sequence encoding calcitonin, thus
rendering a glycine-extended calcitonin.

According to the invention, the 3' end of said first nucleic acid sequence is linked to
the 5' end of said second nucleic acid sequence and the 3' end of said second nucleic acid
sequence is linked to the 5' end of said third nucleic acid sequence, i.e., said first, second and
third nucleic acid sequences are in reading frame.
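
The requirement that the three sequences be linked 5' to 3' in a single reading frame can be sketched as follows. This is an illustrative check under stated assumptions, not the cloning procedure of the Example: the segments FIRST, SECOND and THIRD are hypothetical placeholders (two PPPVHL repeat units, an enterokinase site, and a truncated peptide followed by a glycine codon and a stop codon), and the codons were chosen arbitrarily rather than taken from the SEQ ID listings or from the plant codon usage of Figure 2.

```python
# Standard genetic code, with bases ordered T, C, A, G.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AAS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(dna):
    """Translate an upper-case DNA string codon by codon ('*' marks a stop)."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def link_in_frame(first, second, third):
    """Join three coding segments 5'->3' and verify the reading frame."""
    for name, seg in (("first", first), ("second", second), ("third", third)):
        if len(seg) % 3:
            raise ValueError(f"{name} segment would shift the reading frame")
    fusion = first + second + third
    protein = translate(fusion)
    if "*" in protein[:-1]:
        raise ValueError("premature stop codon inside the fusion")
    return fusion, protein


# Hypothetical placeholder segments (not the sequences of the invention).
FIRST = "ATG" + "CCTCCTCCTGTTCATCTT" * 2          # Met + two PPPVHL units
SECOND = "GATGATGATGATAAA"                        # Asp-Asp-Asp-Asp-Lys (EK site)
THIRD = "TGTTCTAATCTTTCTACTTGT" + "GGT" + "TAA"   # CSNLSTC + Gly codon + stop

if __name__ == "__main__":
    fusion_dna, fusion_protein = link_in_frame(FIRST, SECOND, THIRD)
    print(fusion_protein)
```

A segment whose length is not a multiple of three raises an error, which mirrors the point that a frame shift at either junction would prevent the cleavage site and the product of interest from being translated correctly.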

The nucleic acid sequence of the invention may be obtained by using conventional
techniques known to the person skilled in the art. In general, said techniques involve linking
different fragments of the nucleic acid sequence of the invention in a suitable vector. A
review of said conventional techniques may be found, for example, in "Molecular Cloning: A
Laboratory Manual", 2nd ed., by Sambrook et al., Cold Spring Harbor Laboratory Press, 1989.
The construction of some vectors containing a nucleic acid of the invention is disclosed in the
Example and illustrated in Figures 3 and 4. As shown therein, various proline-rich domains
were engineered from γ-zein to serve as fusion partners through a cleavable protease site.
The mature calcitonin coding region (32 aa) was fused at the C-terminus of the γ-zein domains
and expressed in transgenic tobacco plants. The fusion proteins were accumulated in ER-
derived protein bodies in tobacco leaves. After purification, the fusion proteins were
subjected to enterokinase cleavage, permitting the release of calcitonin, which may be further
purified from the digestion mixture by reverse phase chromatography.

In another aspect, the invention provides a fusion protein, hereinafter referred to as the
fusion protein of the invention, comprising (i) the amino acid sequence of the protein γ-zein,
or a fragment thereof capable of directing and retaining a protein towards the ER, (ii) an
amino acid sequence that is specifically cleavable by enzymatic or chemical means, and (iii) a
product of interest; said fusion protein being the expression product of the nucleic acid
sequence of the invention in a plant host system.

The fusion protein of the invention is accumulated in stable, ER-derived PBs in a plant
host system. The enzymatically or chemically cleavable site, which is present at the
C-terminus of the γ-zein domains, allows the product of interest to be recovered afterwards.
The product of interest may then be isolated and purified by conventional means. Therefore,
the fusion protein of the invention constitutes a novel and successful approach to
accumulating a product of interest.

In an embodiment, the fusion protein of the invention comprises a full-length γ-zein
protein. A specific amino acid sequence of full-length γ-zein is shown in Figure 1A and
identified in SEQ ID NO: 6.

In another embodiment, the fusion protein of the invention comprises a fragment of a
γ-zein protein, said fragment containing an amino acid sequence capable of directing and
retaining a protein towards the ER. In a particular embodiment, the fusion protein of the
invention comprises a fragment of a γ-zein protein, selected from the group consisting of:

- the amino acid sequence shown in SEQ ID NO: 7 [amino acid sequence
corresponding to RX3 (Figure 1B)],

- the amino acid sequence shown in SEQ ID NO: 8 [amino acid sequence
corresponding to R3 (Figure 1B)],

- the amino acid sequence shown in SEQ ID NO: 9 [amino acid sequence
corresponding to P4 (Figure 1C)], and

- the amino acid sequence shown in SEQ ID NO: 10 [amino acid sequence
corresponding to X10 (Figure 1C)].

The fusion protein of the invention contains an amino acid sequence that is
specifically cleavable by enzymatic or chemical means. In a particular embodiment, said
cleavable site comprises a protease cleavage site, for example, an amino acid site cleavable by
a protease such as an enterokinase, Arg-C endoprotease, Glu-C endoprotease, Lys-C
endoprotease, factor Xa and the like, or an amino acid site cleavable by a chemical reagent,
such as, for example, cyanogen bromide, which cleaves at methionine residues, or any other
suitable chemical reagent.

The fusion protein of the invention also contains a product of interest, for example, a
proteinaceous (i.e., a protein or peptide) drug, such as a peptide hormone, an interferon, and
the like. Preferably, said product of interest is effective for treating the human or animal body.
In a particular embodiment, the fusion protein of the invention comprises a calcitonin (CT),
for example, an optionally glycine-extended human calcitonin (hCT) or salmon calcitonin
(sCT).

In a further aspect, the invention provides a nucleic acid construct comprising (i) the
nucleic acid sequence of the invention, and (ii) a regulatory nucleotide sequence that regulates
the transcription of the nucleic acid of the invention (i), said regulatory sequence (ii) being
functional in plants. Said nucleic acid sequences are operatively linked.

Practically any regulatory sequence functional in plants may be used. In an embodiment,
said regulatory sequence (ii) is, preferably, tissue-specific, i.e., it can regulate the transcription
of the nucleic acid of the invention in a specific tissue, such as seeds, leaves, tubers, etc.

The regulatory sequence (ii) may comprise a promoter functional in plants. Virtually
any promoter functional in plants may be used. In a particular embodiment, said regulatory
sequence (ii) comprises the 35S CaMV promoter. In another particular embodiment, said
regulatory sequence (ii) comprises the patatin promoter, a storage protein promoter, the
ubiquitin gene promoter, the regulatory sequences of the γ-zein gene, or the like.

The regulatory sequence (ii) may also contain a transcription termination sequence.
Virtually any transcription termination sequence functional in plants may be used. In a
particular embodiment, said transcription termination sequence comprises the 35S CaMV
terminator, the terminator of the octopine synthase (ocs) gene, the terminator of the nopaline
synthase (nos) gene, the terminator of the γ-zein gene, and the like.

The regulatory sequence (ii) may also contain a translation enhancer functional in
plants. Virtually any translation enhancer functional in plants may be used, for example, the
promoting sequence for transcription of the tomato etch virus, and the like.
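
The layout of such a construct can be summarized as an ordered list of elements. The sketch below is purely illustrative: the class name Construct, its field names and the element labels in the example are hypothetical, and the particular combination shown is one arbitrary choice among the options listed above rather than the construct used in the Example. It only assumes the usual 5' to 3' order of promoter, optional translation enhancer, coding sequence and terminator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Construct:
    """Hypothetical description of a plant expression construct."""
    promoter: str
    coding_sequence: str
    terminator: str
    translation_enhancer: Optional[str] = None

    def elements(self) -> List[str]:
        """Return the elements in 5'->3' order, skipping absent ones."""
        ordered = [
            self.promoter,
            self.translation_enhancer,
            self.coding_sequence,
            self.terminator,
        ]
        return [element for element in ordered if element is not None]


# Arbitrary illustrative combination of the options named in the text.
example = Construct(
    promoter="35S CaMV promoter",
    translation_enhancer="viral leader (translation enhancer)",
    coding_sequence="gamma-zein fragment :: cleavable site :: product of interest",
    terminator="35S CaMV terminator",
)

if __name__ == "__main__":
    print(" -> ".join(example.elements()))
```

Keeping the enhancer optional reflects the wording of the text, in which the regulatory sequence "may also contain" a translation enhancer.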

The nucleic acid sequence of the invention, or the construct provided by this
invention, can be inserted into an appropriate vector. Therefore, in a further aspect, the
invention provides a vector comprising the nucleic acid sequence of the invention or a nucleic
acid construct provided by the instant invention. Suitable vectors include plasmids, cosmids
and viral vectors. In an embodiment, said vector is suitable for transforming plants. The
choice of the vector may depend on the host cell into which it is to be subsequently introduced.
By way of example, the vector into which the nucleic acid sequence of the invention is
introduced may be a plasmid, a cosmid or a viral vector that, when introduced into a host cell,
is integrated into the genome of said host cell and is replicated along with the chromosome (or
chromosomes) in which it has been integrated. To obtain said vector, conventional methods
can be used (Sambrook et al., 1989).

In a further aspect, the invention provides a plant host system, said plant host system
having been transformed with the nucleic acid of the invention, or with a construct or a vector
provided by the instant invention.

As used herein, the term "plant host system" includes plants, including, but not limited
to, monocots, dicots, and, specifically, cereals (e.g., maize, rice, oat, etc.), legumes (e.g., soy,
etc.), cruciferous (e.g., Arabidopsis thaliana, colza, etc.) or solanaceous (e.g., potato, tomato,
tobacco, etc.). Plant host system also encompasses plant cells. Plant cells include suspension
cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes,
sporophytes, pollen, seeds and microspores. A plant host system may be at various stages of
maturity and may be grown in liquid or solid culture, or in soil or suitable medium in pots,
greenhouses or fields. Expression in plant host systems may be transient or permanent. Plant
host system also refers to any clone of such plant, seed, selfed or hybrid progeny, propagule
whether generated sexually or asexually, and descendants of any of these, such as cuttings or
seeds.

The transformation of plant host systems may be carried out by using conventional
methods. A review of genetic transfer to plants may be found in the textbook entitled
"Ingeniería genética y transferencia génica", by Marta Izquierdo, Ed. Pirámide (1999), in
particular Chapter 9, "Transferencia génica a plantas", pages 283-316.

In a further aspect, the invention provides a transgenic plant host system, engineered
to contain a novel, laboratory designed transgene, said transgenic plant host system
comprising, integrated in its genome, the nucleic acid of the invention. Said transgenic plant
host system may be obtained by means of conventional techniques, for example, through the
use of conventional antisense mRNA techniques and/or overexpression (in sense silencing) or
others, for example, by using binary vectors or other vectors available for the different plant
transformation techniques currently in use. Examples of transgenic plant host systems
provided by the present invention include both monocotyledonous and dicotyledonous plants,
and, specifically, cereals, legumes, cruciferous, solanaceous, etc.

The nucleic acid sequence of the invention is useful for producing a product of interest
in a plant host system. Therefore, in a further aspect, the invention provides a method for
producing a product of interest in a plant host system, which comprises growing a
transformed or transgenic plant host system provided by the instant invention, under
conditions that allow the production and expression of said product of interest in the form of a
fusion protein. As mentioned above, said fusion protein is accumulated in stable, ER-derived
PBs in said host plant system. The enzymatically or chemically cleavable site, which is
present at the C-terminus of the γ-zein domains, allows the product of interest to be recovered
afterwards. The product of interest may then be isolated and purified by conventional means.
Accordingly, the method provided by the instant invention further comprises, if desired, the
isolation and purification of said fusion protein and, optionally, the release of said product of
interest from said fusion protein. The fusion protein is cleaved at the cleavage site by a
suitable enzyme or chemical reagent, as appropriate.

In a further aspect, the invention provides a method for producing calcitonin in a plant
host system, comprising:

a) transforming a plant host system with an expression vector or with a nucleic
acid construct, comprising a regulatory sequence for the transcription of a
nucleic acid sequence (nucleic acid sequence of the invention) that consists of:

   a first nucleic acid sequence containing the nucleotide sequence that
   encodes the protein γ-zein, or a fragment thereof containing a nucleotide
   sequence that encodes an amino acid sequence capable of directing and
   retaining a protein towards the endoplasmic reticulum (ER);

   a second nucleic acid sequence containing a nucleotide sequence that
   encodes an amino acid sequence that is specifically cleavable by
   enzymatic or chemical means; and

   a third nucleic acid sequence containing the nucleotide sequence that
   encodes calcitonin;



wo 2004/003207 



18 



PCT/EP2002/008716 



wherein the 3' end of said first nucleic acid sequence is linked to the 5' end of
said second nucleic acid sequence and the 3' end of said second nucleic acid
sequence is linked to the 5' end of said third nucleic acid sequence;

b) generating complete plants from said plant host systems transformed with said
expression vector or nucleic acid construct;

c) growing such transformed plants under conditions that allow the production
and expression of calcitonin in the form of a fusion protein; and, if desired,

d) isolating and purifying said fusion protein and treating said fusion protein in order
to release calcitonin.
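
The order of the three elements in step a) and the release in step d) can be illustrated at the protein level with a minimal Python sketch. Several things here are assumptions made only for illustration: the carrier is a stand-in built from eight PPPVHL repeats rather than an actual γ-zein derived sequence, the function names are hypothetical, and enterokinase is taken to cut immediately after the lysine of its (Asp)4-Lys site. The product is the glycine-extended salmon calcitonin encoded by the synthetic gene of Example 1.

```python
# Illustrative sketch only: function names and the carrier sequence are assumptions.
EK_SITE = "DDDDK"  # enterokinase recognition site, (Asp)4-Lys
SCT_GLY = "CSNLSTCVLGKLSQELHKLQTYPRTNTGSGTP" + "G"  # salmon CT plus C-terminal Gly
CARRIER = "PPPVHL" * 8  # stand-in for a proline-rich gamma-zein domain


def build_fusion(carrier: str, site: str, product: str) -> str:
    """Join carrier, cleavable site and product in N- to C-terminal order."""
    return carrier + site + product


def release_product(fusion: str, site: str = EK_SITE) -> tuple[str, str]:
    """Cut after the last residue of the first site found.

    Returns (carrier part including the site, released product).
    """
    cut = fusion.index(site) + len(site)
    return fusion[:cut], fusion[cut:]


fusion = build_fusion(CARRIER, EK_SITE, SCT_GLY)
carrier_part, released = release_product(fusion)
print(released)
```

Placing the cleavable site directly in front of the product means that, under the stated assumption about the cut position, the released peptide carries no extra N-terminal residues.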

The invention provides, therefore, a fusion protein based system to accumulate
recombinant products of interest in ER-derived PBs in plant host systems. The invention is
further illustrated by the following non-limiting example.

EXAMPLE 1 

Production of calcitonin in tobacco plants

A successful example of CT production in tobacco plants is described below. Various
proline-rich domains were engineered from γ-zein to serve as fusion partners through a
cleavable protease site. The mature CT coding region (32 aa) was fused at the C-terminus of the
γ-zein domains and expressed in transgenic tobacco plants. A cleavable protease site was
introduced at the C-terminus of the γ-zein domains to recover pure calcitonin afterwards. This
approach provides a high accumulation of fusion proteins within the ER and the formation of
ER-derived PBs in tobacco plants. Fusion proteins were highly accumulated in ER-derived
PBs in tobacco leaves. The expression level of said fusion proteins reached, in some cases, up
to 12.44% of total soluble proteins. After only two purification steps, the fusion proteins were
subjected to enterokinase cleavage, permitting the release of calcitonin. Pure calcitonin was
obtained from the digestion mixture by reverse phase chromatography. The calcitonin product
accumulated in tobacco plants was validated by mass spectrometry. Fusion protein
purification, protease digestion and full characterization of the released plant calcitonin (pCT)
are also presented.

I. EXPERIMENTAL PROCEDURE 

Construction of chimeric genes and vectors

The wild type γ-zein gene and four γ-zein derived sequences named RX3, R3, P4 and
X10, encoding different γ-zein domains (Figure 1A, 1B and 1C), were fused with a synthetic
CT gene containing an enterokinase digestion site (Figure 2) and introduced into plant
transformation vectors as described below and in Figure 3.

γ-Zein, RX3 and R3 cDNA sequences were generated by PCR using pKSG2 (Torrent
et al., 1994) as template. X10 cDNA was amplified from pDR20, a plasmid produced from
pKSG2 after deletion of the sequence corresponding to the repeat domain. The primers used
for the different PCRs were:
for γ-zein cDNA sequence:
T1: 5'TCATGAGGGTGTTGCTCGTTGCCCTC3' and
T4: 5'CCATGGCGTGGGGGACACCGCCGGC3',
for RX3 and X10 cDNA sequences:
T1 and
T2: 5'CCATGGTCTGGCACGGGCTTGGATGCGG3', and
for R3 cDNA sequence:
T1 and
T3: 5'CCATGGTCCGGGGCGGTTGAGTAGGGTA3'.
The PCR products were subcloned into a pUC18 vector (SureClone Ligation Kit,
Pharmacia) and the resulting plasmids were named pUCZein, pUCRX3, pUCR3 and
pUCX10. The vector pUCP4, which contains the γ-zein derived sequence P4 (Figure 1C), was
obtained during the screening of pUCRX3 derived clones. γ-Zein, RX3, R3, P4 and X10
cDNA fragments, containing cohesive ends of BspHI and NcoI, were inserted into the vector
pCKGFPS65C (Reichel et al., 1996) previously digested with NcoI. This vector was selected
because it contains the regulatory sequences for expression in plants and the GFP coding
sequence that would be used for parallel targeting studies of γ-zein derived proteins in
transgenic plants. The vectors generated, pCZeinGFP, pCRX3GFP, pCR3GFP, pCP4GFP and
pCX10GFP, contained the following regulatory sequences for expression in plant systems: i)
the enhanced 35S promoter derived from the cauliflower mosaic virus (CaMVp35S), ii) the
translational enhancer from tomato etch virus (TL) and iii) the transcription-termination
sequence from CaMV35S (pA35S). The γ-zein derived/CT chimeric constructs were
generated by substitution of the GFP coding sequence with the CT synthetic gene as described
below (see Figure 3).
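
The statement that the amplified fragments carry BspHI and NcoI cohesive ends, and that they can be ligated into an NcoI-digested vector, can be checked with a short Python sketch. It looks for a recognition site at the 5' end of each primer listed above and compares the overhangs left by the two enzymes. The recognition sites and cut positions (T^CATGA for BspHI, C^CATGG for NcoI) are standard enzyme data rather than something stated in this document, and the function names are hypothetical.

```python
# Primer sequences as listed above (5' to 3').
PRIMERS = {
    "T1": "TCATGAGGGTGTTGCTCGTTGCCCTC",
    "T2": "CCATGGTCTGGCACGGGCTTGGATGCGG",
    "T3": "CCATGGTCCGGGGCGGTTGAGTAGGGTA",
    "T4": "CCATGGCGTGGGGGACACCGCCGGC",
}

# Recognition site and top-strand cut offset (assumed standard values):
# BspHI cuts T^CATGA, NcoI cuts C^CATGG.
SITES = {"BspHI": ("TCATGA", 1), "NcoI": ("CCATGG", 1)}


def five_prime_site(primer):
    """Return the enzyme whose site starts the primer, or None."""
    for enzyme, (site, _cut) in SITES.items():
        if primer.startswith(site):
            return enzyme
    return None


def overhang(enzyme):
    """5' overhang left by a palindromic site cut symmetrically."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


primer_sites = {name: five_prime_site(seq) for name, seq in PRIMERS.items()}
compatible = overhang("BspHI") == overhang("NcoI")
print(primer_sites, overhang("BspHI"), overhang("NcoI"), compatible)
```

Because both enzymes leave the same four-base overhang, a BspHI end from the forward primer can be joined to an NcoI-cut vector end even though the two recognition sites differ.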

The synthetic gene encoding the 32 amino acids of active salmon CT (Figure 2) was
generated from two complementary 122-base oligonucleotides. The oligonucleotides were
designed to use preferential plant codons in order to achieve high expression in plants. The
5' phosphorylated oligonucleotides, synthesized using an Applied Biosystems 394 DNA
synthesizer, had the following sequences:
Cal I:
5'CATGGACGACGACGACAAGTGCTCCAACCTCTCTACCTGCGTTCTTGGTAAGCT
CTCTCAGGAGCTTCACAAGCTCCAGACTTACCCTAGAACCAACACTGGTTCCGGT
ACCCCTGGTTGAT3',
Cal II:
5'CTAGATCAACCAGGGGTACCGGAACCAGTGTTGGTTCTAGGGTAAGTCTGGAG
CTTGTGAAGCTCCTGAGAGAGCTTACCAAGAACGCAGGTAGAGAGGTTGGAGCA
CTTGTCGTCGTCGTC3'.

After purification on a 12% polyacrylamide gel, 60 pmol of each oligonucleotide was
used to form the double-stranded molecule. The hybridization mixture was heated to 95°C for
5 min, maintained at 70°C for 1 hour and allowed to cool to room temperature. The synthetic
cDNA fragment contained NcoI and XbaI cohesive ends at the 5' and 3' termini respectively. The
synthetic CT cDNA included a 5' linker sequence corresponding to the enterokinase specific
cleavage site ((Asp)4-Lys) and was extended at the 3' end to produce a single glycine for further
amidation of the CT peptide. The NcoI/XbaI CT cDNA was subcloned into a pUC18 vector
and was then inserted into the NcoI and BamHI restriction sites of the vectors pCZeinGFP,
pCRX3GFP, pCR3GFP, pCP4GFP and pCX10GFP, which contain the derived γ-zein coding
sequences and from which the GFP coding sequence had been deleted. The resulting constructs
were named pCZeinCT, pCRX3CT, pCR3CT, pCP4CT and pCX10CT (Figure 3). Effective plant
transformation vectors pBZeinCT, pBRX3CT, pBR3CT, pBP4CT and pBX10CT (Figure 4)
were ultimately obtained by inserting the different HindIII/HindIII expression cassettes into
the binary vector pBin19 (Bevan, 1984).
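
The design of the two oligonucleotides can be checked with a small Python sketch: it tests whether Cal I and Cal II anneal to leave a 5' CATG overhang (NcoI-compatible) on one strand and a 5' CTAG overhang (XbaI-compatible) on the other, and translates Cal I to show the enterokinase site, the 32 residues of salmon CT and the added glycine. The reading frame is taken from the ATG that lies within the NcoI overhang, which is an assumption made for illustration; the helper names are hypothetical and the codon table is the standard genetic code.

```python
CAL_I = (
    "CATGGACGACGACGACAAGTGCTCCAACCTCTCTACCTGCGTTCTTGGTAAGCT"
    "CTCTCAGGAGCTTCACAAGCTCCAGACTTACCCTAGAACCAACACTGGTTCCGGT"
    "ACCCCTGGTTGAT"
)
CAL_II = (
    "CTAGATCAACCAGGGGTACCGGAACCAGTGTTGGTTCTAGGGTAAGTCTGGAG"
    "CTTGTGAAGCTCCTGAGAGAGCTTACCAAGAACGCAGGTAGAGAGGTTGGAGCA"
    "CTTGTCGTCGTCGTC"
)

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon until the first stop codon."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Cal I minus its 5' CATG overhang should pair with Cal II minus its 5' CTAG overhang.
duplex_ok = revcomp(CAL_II) == CAL_I[4:] + "CTAG"
peptide = translate(CAL_I[1:])  # frame assumed to start at the ATG in C|ATG|G
print(len(CAL_I), len(CAL_II), duplex_ok)
print(peptide)
```

In the fusion constructs the actual frame is set by the upstream γ-zein coding sequence; the sketch only shows that the oligonucleotide pair is internally consistent with the description.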

Stable transformation of tobacco plants

Binary vectors were transferred into the LBA4404 strain of Agrobacterium tumefaciens.
Tobacco (Nicotiana tabacum, W38) leaf discs were transformed according to the method of
Draper et al. (1988). Regenerated plants were selected on medium containing 200 mg/L
kanamycin and transferred to a greenhouse. Transgenic tobacco plants having the highest
transgene product levels were cultivated to obtain the T1 generation. Developing leaves
(approximately 12 cm long) were harvested, immediately frozen in liquid nitrogen and
stored at -80°C for further experiments.

Extraction and western blot analysis of recombinant proteins

Tobacco leaves were ground in liquid nitrogen and homogenized using 4 ml of
extraction buffer (50 mM Tris-HCl pH 8, 200 mM dithiothreitol (DTT) and protease
inhibitors (10 µM aprotinin, 1 µM pepstatin, 100 µM leupeptin, 100 µM
phenylmethylsulphonyl fluoride and 100 µM E64
[N-(N-(L-3-trans-carboxyoxirane-2-carbonyl)-L-leucyl)-agmatine])) per gram of fresh leaf
material. The homogenates were stirred
for 30 min at 4°C and then centrifuged twice (15000 rpm, 30 min, 4°C) to remove insoluble
material. Total soluble proteins were quantified using the Bradford protein assay (Bio-Rad).
Proteins were separated on a 15% SDS polyacrylamide gel and transferred to nitrocellulose
membranes (0.22 µm) using a semidry apparatus. Membranes were incubated with γ-zein
antiserum (dilution 1/7000) (Ludevid et al., 1985) or an antiserum raised against
KLH-calcitonin (CT-antiserum) (dilution 1/1000) and were then incubated with horseradish
peroxidase conjugated antibodies (dilution 1/10000). Immunoreactive bands were detected by
enhanced chemiluminescence (ECL western blotting system, Amersham). Calcitonin
antibodies were raised in rabbits by inoculating synthetic salmon calcitonin coupled to KLH.
After four inoculations of the antigen (200 µg each), the sera were collected, aliquoted and
stored at -80°C. Serum titrations were carried out by immuno-dot blots using synthetic calcitonin
and by ELISA assays using BSA-calcitonin as antigen.

Northern blot analysis 

Total RNA was isolated from wild type and transgenic tobacco (T1) leaves according
to Logemann et al., 1987. RNA was fractionated by denaturing formamide-agarose gel
electrophoresis (30 µg per lane) and was capillary blotted onto nylon membrane (Hybond N,
Amersham Pharmacia Biotech). RNA blots were hybridized with a 129-base DNA probe
obtained from CT cDNA and labeled with [α-32P]dCTP using a random primed DNA
labeling kit (Roche). Hybridization was carried out overnight at 42°C and filters were washed
three times for 15 min in 3X SSC and 0.5% SDS (w/v) at 65°C. Blots were detected with a
PhosphorImager scanner (Fluor-S MultiImager, BIO-RAD).

ELISA assays 

ELISA assays were conducted for plant calcitonin (pCT) quantification on soluble leaf
protein extracts and partially purified γ-zein-CT fusion proteins. Microtiter plates (MaxiSorp,
Nalgene Nunc International) were loaded with soluble proteins (100 µl) diluted in
phosphate-buffered saline pH 7.5 (PBS) and incubated overnight at 4°C. After washing the wells three
times, non-specific binding sites were blocked with 3% bovine serum albumin (BSA) in PBS-T
(PBS containing 0.1% Tween 20) for one hour at room temperature. The plates were incubated
with CT antiserum (dilution 1/1000) for two hours and, after four washes with PBS-T,
incubated with peroxidase-conjugated secondary antibodies (dilution 1/8000) (Sigma) for two
hours. Primary and secondary antibodies were diluted in PBS-T containing 1% BSA. After
washing extensively with PBS-T, the enzymatic reaction was carried out at 37°C with 100 µl
of substrate buffer (100 mM sodium acetate pH 6, 0.01 mg/ml TMB
(3,3',5,5'-tetramethylbenzidine) and 0.01% hydrogen peroxide). The reaction was stopped after 10 min
with 2N sulfuric acid and the optical density was measured at 450 nm using a Multiskan EX
spectrophotometer (Labsystems). The antigen concentration in plant extracts was extrapolated
from a standard curve obtained by using calcitonin-BSA and CT antiserum (dilution 1/1000).

Electron microscopy

Leaves from wild-type and transgenic plants were fixed by vacuum infiltration with
1% glutaraldehyde and 2.5% paraformaldehyde in 20 mM phosphate buffer, pH 7.4 for one
hour at room temperature. After washing with 20 mM phosphate buffer and 200 mM
ammonium chloride successively, samples were dehydrated through an ethanol series and
embedded in Lowicryl K4M resin. Immunochemistry was performed essentially as described
by Moore et al., 1991. Ultrathin sections were incubated with antisera against
KLH-calcitonin (1/500), BiP (1/500) and γ-zein (1/1500). Protein A-colloidal gold (gold particles
of 15 nm) was used for antibody detection. As a control, parallel incubations were carried out
on non-transgenic plant samples using identical dilutions of primary antibodies and on
transgenic samples without primary antibody. Sections were stained with uranyl acetate and
lead citrate and examined with a model 301 electron microscope (Philips, Eindhoven, The
Netherlands).

Purification and enterokinase cleavage of RX3-CT and P4-CT fusion proteins 

Soluble extracts of RX3-CT and P4-CT were obtained from leaves of transgenic
tobacco plants (T1) in extraction buffer as described above. Solid (NH4)2SO4 was
progressively added at 0°C to RX3-CT and P4-CT soluble extracts to 45% and 60% saturation
respectively. The samples were stirred for 30 min at 0°C and were then centrifuged at 15000
rpm for 45 min at 4°C. The precipitated proteins were resuspended in 20 mM Tris-HCl pH 8.6
and desalted on a PD 10 column (Sephadex G-25 M, Amersham Pharmacia). Desalted protein
extracts were fractionated by Fast Protein Liquid Chromatography (FPLC) using an
anion exchange column (HiTrap Q sepharose, Amersham Pharmacia) equilibrated with 20
mM Tris-HCl pH 8.6, 100 mM DTT. Protein elution was carried out with a linear salt
gradient from 0 to 200 mM NaCl in 20 mM Tris-HCl pH 8.6, 100 mM DTT. The presence of
RX3-CT and P4-CT in eluted fractions was assessed by 15% SDS polyacrylamide gel
electrophoresis and immunoblot detection using CT antiserum. Positive fractions were
desalted and concentrated with 5 K NMWL centrifugal filters (BIOMAX, Millipore).
Quantification of RX3-CT and P4-CT fusion proteins was performed by ELISA.

For EK digestion, 15 µg of partially purified fusion proteins were incubated with 0.2
U EK (EK Max, Invitrogen) in 30 µl of digestion buffer (50 mM Tris-HCl pH 8, 1 mM NaCl,
0.1% Tween-20) for 24 hours at 20°C. EK digestion buffer was supplemented with 100 mM
DTT; the presence of the reducing agent optimizes enterokinase cleavage. Digestion
products were analysed by 18% Tris-Tricine polyacrylamide gel electrophoresis and released
pCT was detected by immunoblot. Synthetic salmon CT was used as positive control.

Purification and analysis of released pCT

Plant calcitonin (pCT) released from fusion proteins by EK digestion was purified by
RP-HPLC. The digestion mixture was applied to an analytical RP-C18 column (250 x 4 mm, 10
µm particle size, 120 Å pore size) and the column was eluted using a gradient ranging from
25 to 60% acetonitrile with 0.036% TFA in 20 min at a flow rate of 1 ml/min. The fractions
collected were concentrated by lyophilization and stored at -20°C for pCT characterization. In
a separate experiment, standard salmon CT was eluted under the same chromatographic
conditions. TOF-MALDI mass spectrometry was used for pCT characterization. RP-HPLC
fraction aliquots were mixed with an equal volume of a matrix solution (10 mg/ml
α-cyano-4-hydroxycinnamic acid and 0.1% TFA) and 1 µl of the mixture was deposited on the holder and
analyzed with a Voyager-DE-RP mass spectrometer (Applied Biosystems). Standard salmon
CT was always used in TOF-MALDI mass spectrometry experiments as a control. C-terminal
analysis of the pCT was performed by incubating the purified peptide (20 pmol/µl) for 60
min at 37°C with carboxypeptidase Y (0.1 U/µl) and analysing the digestion products by
TOF-MALDI mass spectrometry.
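
As a rough worked example of the RP-HPLC conditions, the Python sketch below computes the nominal acetonitrile percentage and eluted volume at a given time for the linear 25 to 60% gradient over 20 min at 1 ml/min. It assumes an ideal linear gradient with no delay volume, which is a simplification; the function names are hypothetical. The times 13 and 14 min are the retention times reported for pCT and synthetic sCT in the Results.

```python
def acetonitrile_percent(t_min, start=25.0, end=60.0, duration_min=20.0):
    """Nominal % acetonitrile at time t for a linear gradient (no delay volume)."""
    t = min(max(t_min, 0.0), duration_min)  # clamp to the gradient window
    return start + (end - start) * t / duration_min


def eluted_volume_ml(t_min, flow_ml_per_min=1.0):
    """Volume of mobile phase delivered up to time t."""
    return t_min * flow_ml_per_min


for label, t in (("pCT", 13.0), ("sCT standard", 14.0)):
    print(label, acetonitrile_percent(t), eluted_volume_ml(t))
```

Under these assumptions the gradient rises by 1.75% acetonitrile per minute, so a one-minute difference in retention corresponds to a small difference in nominal solvent composition.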

II. RESULTS

Construction of several derived γ-zein-CT chimeric genes

The expression and successful assembly of γ-zein proline-rich domains into
ER-derived protein bodies in plant leaves (Geli et al., 1994) provide a valuable tool to accumulate
therapeutic proteins in the ER of plant tissues. The γ-zein gene was deleted to create various
proline-rich truncated proteins used as fusion partners to produce CT in tobacco plants. The
chimeric genes comprised the γ-zein domains and a CT synthetic gene joined by a linker
corresponding to a protease cleavable site. The synthetic gene encoding the 32-amino-acid
active salmon calcitonin was generated from two complementary oligonucleotides (122 bases)
designed to use preferential plant codons in order to achieve high expression of the
recombinant peptide in plants. The synthetic CT cDNA (Figure 2) included at the 5' end a linker
sequence corresponding to the enterokinase cleavage site ((Asp)4-Lys) and at the 3' end an
additional codon to produce a glycine. This glycine is a necessary substrate for the amidating
enzyme (PAM) to generate the C-terminal prolinamide essential for CT biological activity.
The calcitonin cDNA was fused to the sequences encoding the γ-zein domains in a C-terminal
fusion. For optimal expression of the derived γ-zein-CT chimeric genes in plant systems, the
plant transformation vectors contained the following regulatory sequences: i) the constitutive
enhanced 35S promoter and the 35S terminator from the cauliflower mosaic virus and ii) the
translational enhancer from tomato etch virus (TL). The different fusion proteins generated
are represented in Figure 5. The γ-zein-CT fusion protein contains the whole γ-zein fused to
CT. The RX3-CT, R3-CT, P4-CT and X10-CT fusion proteins contain the derived γ-zein
domains linked to CT in the same way as whole γ-zein. These fusion proteins differ
essentially in the presence or the absence of the repeat and proX domains.

Production of fusion proteins in tobacco plants 

All the fusion genes were used for stable tobacco plant transformation via
Agrobacterium tumefaciens. At least twenty independent kanamycin-resistant plants (T0)
were regenerated for each fusion gene. The screening of the transgenic plants was performed
by western blot analysis of soluble protein extracts using a γ-zein polyclonal antiserum.
Immunoblot patterns of transgenic lines representative of each fusion gene are shown in
Figure 6. As observed, recombinant fusion proteins were obtained in all transgenic lines with
the exception of the X10-CT fusion gene, where no traces of fusion proteins were detected.
This small fusion protein (80 amino acids) is probably unstable in tobacco plants. Two
immuno-labelled bands were detected in the R3-CT transgenic lines, one with an atypical
high apparent molecular mass. This fusion protein was probably subjected to post-translational
modifications such as glycosylation. Indeed, it has been demonstrated that the γ-zein
proline-rich repeat domain can be glycosylated when expressed in Arabidopsis plants (Alvarez et
al., 1998). Protein expression level was quite variable between the different lines of a same
fusion gene with the exception of the RX3-CT fusion gene, which showed a high recombinant
protein expression level in all transgenic lines. An additional immunoblot screening was
carried out using an antiserum specifically raised against the sCT peptide (Figure 7 A). As
observed, the RX3-CT and the P4-CT proteins were strongly recognized by the sCT
antiserum, indicating that these fusions provide a better accumulation of the CT peptide in
tobacco plants. It could be noted that RX3-CT and P4-CT immunoblot patterns displayed
several labelled bands, the major band corresponding to the correct apparent molecular mass
of the related recombinant protein. One hypothesis could be that the high molecular weight
labelled bands were the result of an oligomerization process on γ-zein domains which formed
during the accumulation of the fusion proteins in plant tissues. In order to check the
expression levels of fusion genes in relation to protein levels, a comparative northern blot
analysis (Figure 7 B) was performed using the transgenic lines analysed by immunoblot in
Figure 7 A. As shown, RX3-CT and P4-CT transcripts were the most abundant, demonstrating
a stable accumulation of these transcripts. Surprisingly, R3-CT transcripts were relatively
abundant in comparison to the low R3-CT fusion protein level detected by immunoblot.
Probably, the post-translational modification prevents the correct self-assembly of the fusion
protein and subsequently its stability in the ER.

The maximum expression levels of RX3-CT and P4-CT proteins, measured by ELISA
on leaf protein extracts from T1 plants, were respectively 12.44% and 10.65% of total soluble
proteins, whereas γ-zein-CT and R3-CT expression levels remained as low as 0.01% of total
soluble proteins. In view of these results, RX3-CT and P4-CT transgenic lines were
chosen for further experiments leading to the production of plant calcitonin (pCT).

Subcellular localization of fusion proteins RX3-CT and P4-CT

Expression of γ-zein and two γ-zein deletion mutants in Arabidopsis plants
demonstrated that these proteins located within the ER of mesophyll cells, forming ER-derived
PBs (Geli et al., 1994). It was not evident, however, that the calcitonin fused to γ-zein
derivatives was sorted to similar organelles, the ER-PBs. To examine the subcellular
localization in tobacco leaves of the γ-zein fusion proteins containing calcitonin, the inventors
used immunoelectron microscopy (Figure 8). Ultrathin sections of transgenic tobacco leaves
expressing RX3-CT and P4-CT proteins were incubated with CT antibody and protein
A-gold. Large, strongly labelled PB-like organelles were observed in mesophyll cells of tobacco
expressing RX3-CT and P4-CT (Figure 8 A and B, respectively). Few vesicles were detected
per cell and their size was quite heterogeneous. Since fusion proteins contained calcitonin
protein and γ-zein fragments, the ultrathin sections were also incubated with γ-zein antibody.
As expected, the PBs were labelled with γ-zein antibody, confirming that the fusion
proteins accumulated inside these organelles (Figure 8 C). To demonstrate that the PBs were
formed from the ER, the sections were incubated with an antibody against the ER resident
protein, BiP (Figure 8 D). The concomitant occurrence of the CT-fusion proteins and BiP in
these organelles indicated that RX3-CT and P4-CT accumulated within the ER lumen to form
further independent ER-derived vesicles. Since the inventors were not able to detect PB-like
organelles in ultrathin sections of non-transgenic plants (Figure 8 E), the control experiments
were performed without primary antibody in transgenic plants (Figure 8 F). As expected, no
specific label was detected in control experiments.

Purification of fusion proteins and release of pCT

RX3-CT and P4-CT fusion proteins were effectively extracted from transgenic
tobacco leaves (T1) using an extraction buffer including a reducing agent such as DTT (200
mM). About 85 µg of RX3-CT and 73 µg of P4-CT were recovered per gram of fresh
material. RX3-CT and P4-CT proteins were concentrated respectively by 45% and 60%
ammonium sulfate precipitation. The desalted protein extracts were fractionated by FPLC
using anion exchange chromatography and the recovered fusion proteins were quantified
by ELISA. RX3-CT protein represented about 80% of total purified proteins whereas P4-CT
was only about 50% of total purified proteins. Such a difference could be explained by the fact
that more proteins precipitate at 60% ammonium sulfate than at 45% and that consequently
the precipitated P4-CT proteins contained many more contaminant proteins. The partially
purified fusion proteins RX3-CT and P4-CT were digested by EK and pCT release was
monitored by Tris-Tricine polyacrylamide gel electrophoresis and immunodetection. As
shown in Figure 9, a single labelled band corresponding to calcitonin was generated from
both RX3-CT and P4-CT protein cleavage. Small amounts of fusion proteins RX3-CT and
P4-CT remained undigested, probably because some cleavage sites were not accessible to
the enzyme.

Purification and characterization of pCT

Plant calcitonin (pCT) was isolated by fractionation of the EK digestion mixtures on
an analytical C18 RP-HPLC column (Figure 10) and analysis of the eluted fractions by
TOF-MALDI mass spectrometry using synthetic sCT as standard (MW 3433.24, Figure 11 A).
pCT eluted at 13 min (synthetic sCT Tr = 14 min) and gave a single spectrum
with a mass of 3491.93 Da by TOF-MALDI mass spectrometry, which is consistent with the
theoretical molecular mass of the reduced C-terminal glycine-extended calcitonin (Figure 11
B). Mass spectrometry analysis of pCT subjected to carboxypeptidase Y digestion confirmed
the integrity of the C-terminal glycine that is essential to produce the C-terminal prolinamide.
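
The agreement between the measured mass and the theoretical mass of reduced, glycine-extended calcitonin can be checked with a short calculation. The Python sketch below sums average residue masses for the 33-residue peptide and adds one water. It assumes that the reported value is to be compared with the average (not monoisotopic) mass of the neutral peptide, with both cysteines as free thiols and a free C-terminal carboxyl group; the residue masses are standard average values and the function name is hypothetical.

```python
# Standard average residue masses (Da) for the residues present in salmon calcitonin.
RESIDUE_MASS = {
    "G": 57.0519, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "N": 114.1038,
    "Q": 128.1307, "K": 128.1741, "E": 129.1155, "H": 137.1411,
    "R": 156.1875, "Y": 163.1760,
}
WATER = 18.0153  # added once for the free N- and C-termini

# Salmon calcitonin (32 residues) extended by one C-terminal glycine.
PCT = "CSNLSTCVLGKLSQELHKLQTYPRTNTGSGTP" + "G"
MEASURED = 3491.93  # Da, value reported above for pCT


def average_mass(peptide):
    """Average mass of a linear, reduced peptide with free termini."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


theoretical = average_mass(PCT)
print(f"theoretical {theoretical:.2f} Da, difference {theoretical - MEASURED:+.2f} Da")
```

Under these assumptions the calculated and reported values differ by far less than the mass of a single residue, which is what would be expected for an intact glycine-extended peptide.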

III. DISCUSSION

A successful fusion protein-based system to accumulate salmon calcitonin in tobacco
plants is presented. Two fusion proteins, RX3-Cal and P4-Cal, were found to strongly
accumulate in ER-derived PBs of tobacco leaves. These fusion proteins contain the CT
peptide and the proline-rich domains of γ-zein, which consist of i) the repeat domain composed
of eight units of the hexapeptide PPPVHL (only one unit in the P4-Cal fusion protein) and ii) the
proX domain, where proline residues alternate with other amino acids. The γ-zein proline-rich
domains are necessary for the correct retention and assembly of γ-zein within the ER of
Arabidopsis plants (Geli et al., 1994). The folding and the stabilization of the γ-zein polypeptide chains
in the ER have been attributed to the ability of the repeat and proX domains to self-assemble
and to promote the formation of oligomers. The particular conformation adopted by these
highly hydrophobic domains would be due to the proline-rich sequences, which are able to
form an amphipathic secondary structure. As a result of their proper conformation, the
proline-rich domains would induce aggregation mechanisms involving protein-protein interactions
and disulphide cross-links leading to ER retention and the formation of ER-derived
PBs. This example shows that, when expressed as an N-terminal fusion partner, the γ-zein
proline-rich domains conserve the whole capacity to self-assemble and to promote the
complex events which lead to retention and accumulation in the ER-derived PBs. The
salmon CT involved in the fusion protein was also found to greatly accumulate in the PBs.
The high expression level of CT in the transgenic tobacco plants can be attributed to the
ability of the proline-rich domains to fold and to stabilize the fusion protein. The deposition of
the fusion protein in the PBs certainly contributes to the enrichment of the plant tissues in CT
by removing it from the hydrolytic intracellular environment. As small peptides are unstable
in biological systems, the fusion protein approach is commonly used to produce
calcitonin in heterologous systems, for example in E. coli (Ray et al., 1993; Yabuta et al.,
1995; Hong et al., 2000), in Staphylococcus carnosus (Dilsen et al., 2000) and in the milk of
transgenic rabbits (Mckee et al., 1998). In this last case, the fusion of CT with human alpha
lactalbumin also had the purpose of masking the calcitonin activity to avoid a possible
interference with normal animal development.

Inventors succeeded in rapid production of glycine extended sCT from tobacco plants: 

i) RX3-Cal and P4-Cal fusion proteins were efficiently recovered from tobacco 
tissues because of their high solubility in the presence of reducing agents, 

ii) enterokinase release of calcitonin from the fusion proteins was accomplished 
after one purification step of the fusion protein by an anion exchange 
chromatography, and 

iii) a reverse phase chromatography led to purified CT by removing it from EK 
digestion mixture. 

Mass spectrometry analysis of the released CT confirmed that correct glycine 
20 extended CT was produced by the tobacco plants. 
CLAIMS



1. A nucleic acid sequence comprising:

a first nucleic acid sequence containing the nucleotide sequence that encodes the
protein γ-zein, or a fragment thereof containing a nucleotide sequence that encodes an amino
acid sequence capable of directing and retaining a protein towards the endoplasmic reticulum
(ER) of a plant cell;

a second nucleic acid sequence containing a nucleotide sequence that encodes an
amino acid sequence that is specifically cleavable by enzymatic or chemical means; and

a third nucleic acid sequence containing the nucleotide sequence that encodes a
product of interest;

wherein the 3' end of said first nucleic acid sequence is linked to the 5' end of said
second nucleic acid sequence and the 3' end of said second nucleic acid sequence is linked to
the 5' end of said third nucleic acid sequence.

2. Nucleic acid sequence according to claim 1, wherein said first nucleic acid sequence
comprises a nucleotide sequence that encodes the full-length γ-zein protein.

3. Nucleic acid sequence according to claim 1, wherein said first nucleic acid sequence
comprises:

one or more nucleotide sequences encoding all or part of the repetition domain of
the protein γ-zein;

one or more nucleotide sequences encoding all or part of the ProX domain of the
protein γ-zein; or



wo 2004/003207 






PCT/EP2002/008716 



one or more nucleotide sequences encoding all or part of the repetition domain of
the protein γ-zein, and one or more nucleotide sequences encoding all or part of the
ProX domain of the protein γ-zein.

4. Nucleic acid sequence according to claim 1, wherein said first nucleic acid sequence
is chosen from the following group:

- the nucleotide sequence shown in SEQ ID NO: 1 [nucleotide sequence that
encodes γ-zein (Figure 1A)],

- the nucleotide sequence shown in SEQ ID NO: 2 [nucleotide sequence identified
as RX3 (Figure 1B)],

- the nucleotide sequence shown in SEQ ID NO: 3 [nucleotide sequence identified
as R3 (Figure 1B)],

- the nucleotide sequence shown in SEQ ID NO: 4 [nucleotide sequence identified
as P4 (Figure 1C)], and

- the nucleotide sequence shown in SEQ ID NO: 5 [nucleotide sequence identified
as X10 (Figure 1C)].

5. Nucleic acid sequence according to claims 1-4, wherein said second nucleic acid
sequence comprises a nucleotide sequence that encodes an amino acid sequence that defines a
protease cleavage site.

6. Nucleic acid sequence according to claim 5, wherein said protease is an
enterokinase, an Arg-C endoprotease, a Glu-C endoprotease, a Lys-C endoprotease or factor
Xa.

7. Nucleic acid sequence according to claims 1-4, wherein said second nucleic acid
sequence comprises a nucleotide sequence that encodes an amino acid that is specifically
cleavable by a chemical reagent.

8. Nucleic acid sequence according to claim 7, wherein said chemical reagent is 
cyanogen bromide. 

9. Nucleic acid sequence according to claims 1-8, wherein said product of interest is a 
proteinaceous drug. 

10. Nucleic acid sequence according to claim 9, wherein said proteinaceous drug is a 
peptide hormone or an interferon, said drug being effective for treating the human or animal 
body. 

11. Nucleic acid sequence according to claim 10, wherein said peptide hormone is 
selected from calcitonin, erythropoietin, thrombopoietin and growth hormone. 

12. Nucleic acid sequence according to claim 1, wherein said third nucleic acid 
sequence comprises a nucleotide sequence encoding calcitonin and a codon for glycine at the 
3' end of said nucleic acid sequence encoding calcitonin. 

13. A fusion protein comprising:

(i) the amino acid sequence of the protein γ-zein, or a fragment thereof capable of
directing and retaining a protein towards the ER of a plant cell,

(ii) an amino acid sequence that is specifically cleavable by enzymatic or chemical
means, and

(iii) a product of interest;

said fusion protein being the expression product of the nucleic acid sequence of any of claims
1-12 in a host plant system.

14. Fusion protein according to claim 13, comprising a full-length γ-zein protein.

15. Fusion protein according to claim 14, comprising the amino acid sequence shown
in Figure 1A and identified in SEQ ID NO: 6.

16. Fusion protein according to claim 13, comprising a fragment of a γ-zein protein,
said fragment containing an amino acid sequence capable of directing and retaining a protein
towards the ER of a plant cell.

17. Fusion protein according to claim 16, comprising a fragment of a γ-zein protein
selected from the group consisting of:

- the amino acid sequence shown in SEQ ID NO: 7 [amino acid sequence
corresponding to RX3 (Figure 1B)],

- the amino acid sequence shown in SEQ ID NO: 8 [amino acid sequence
corresponding to R3 (Figure 1B)],

- the amino acid sequence shown in SEQ ID NO: 9 [amino acid sequence
corresponding to P4 (Figure 1C)], and

- the amino acid sequence shown in SEQ ID NO: 10 [amino acid sequence
corresponding to X10 (Figure 1C)].

18. Fusion protein according to claim 13, wherein said amino acid sequence that is
specifically cleavable by enzymatic means comprises a protease cleavage site.

19. Fusion protein according to claim 13, wherein said amino acid sequence that is
specifically cleavable by chemical means comprises a cleavage site cleavable by a chemical
reagent.

20. Fusion protein according to claim 13, wherein said product of interest is a
proteinaceous drug.

21. A nucleic acid construct comprising (i) a nucleic acid sequence according to any of 
claims 1 to 12, and (ii) a regulatory nucleotide sequence that regulates the transcription of the 
nucleic acid of the invention (i), said regulatory sequence (ii) being functional in plants. 

22. Construct according to claim 21, wherein said regulatory sequence (ii) is tissue-specific.

23. Construct according to claim 21, wherein said regulatory sequence (ii) comprises a
promoter functional in plants.

24. Construct according to claim 22, wherein said regulatory sequence (ii) comprises
the promoter 35SCaMV, the patatin promoter, a storage protein promoter, the ubiquitin
gene promoter or the regulatory sequences of the γ-zein gene.

25. Construct according to claim 21, wherein said regulatory sequence (ii) comprises a
transcription termination sequence functional in plants.

26. Construct according to claim 25, wherein said regulatory sequence (ii) comprises
the terminator 35SCaMV, the terminator of the octopine synthase (ocs) gene, the terminator
of the nopaline synthase (nos) gene or the terminator of the γ-zein gene.

27. Construct according to claim 21, wherein said regulatory sequence (ii) further
comprises a translation enhancer functional in plants.

28. Construct according to claim 27, wherein said translation enhancer functional in
plants comprises the promoting sequence for transcription of the tomato etch virus, and the
like.

29. A vector comprising a nucleic acid sequence according to any of claims 1 to 12, or
a nucleic acid construct according to any of claims 21 to 28.

30. A transformed plant host system, said plant host system having been transformed
with a nucleic acid sequence according to any of claims 1 to 12, or with a nucleic acid
construct according to any of claims 21 to 28, or with a vector according to claim 29.

31. A transgenic plant host system, said transgenic plant host system comprising,
integrated in its genome, a nucleic acid sequence according to any of claims 1 to 13.

32. Plant host system according to claim 30 or 31, wherein said plant host system is a
monocot or a dicot plant.

33. Plant host system according to claim 32, wherein said plant host system is a cereal,
a legume, a cruciferous or a solanaceous plant.

34. Plant host system according to claim 30 or 31, comprising a seed.

35. A method for producing a product of interest in a plant host system, which
comprises growing a transformed or transgenic plant host system according to claims 30-34,
under conditions that allow the production and expression of said product of interest in the
form of a fusion protein.

36. Method according to claim 35, which further comprises the isolation and
purification of said fusion protein.

37. Method according to claim 35 or 36, which further comprises the release of said
product of interest from said fusion protein.

38. A method for producing calcitonin in a plant host system, comprising:

a) transforming a plant host system with an expression vector or with a nucleic 
acid construct, comprising a regulatory sequence for the transcription of a 
nucleic acid sequence (nucleic acid sequence of the invention) that consists of: 

a first nucleic acid sequence containing the nucleotide sequence that
encodes the protein γ-zein, or a fragment thereof containing a nucleotide
sequence that encodes an amino acid sequence capable of directing and
retaining a protein towards the endoplasmic reticulum (ER) of a plant cell;

a second nucleic acid sequence containing a nucleotide sequence that 
encodes an amino acid sequence that is specifically cleavable by 
enzymatic or chemical means; and 

a third nucleic acid sequence containing the nucleotide sequence that 
encodes calcitonin; 

wherein the 3' end of said first nucleic acid sequence is linked to the 5' end of 
said second nucleic acid sequence and the 3' end of said second nucleic acid 
sequence is linked to the 5' end of said third nucleic acid sequence; 

b) generating complete plants from said plant host systems transformed with said 
expression vector or nucleic acid construct; 

c) growing such transformed plants under conditions that allow the production 
and expression of calcitonin in the form of a fusion protein; and, if desired 

d) isolating, purifying said fusion protein and treating said fusion protein in order 
to release calcitonin. 
γ-zein sequence

atgagggtgttgctcgttgccctcgctctcctggctctcgctgcgagcgccacctccacg
MRVLLVALALLALAASATST
catacaagcggcggctgcggctgccagccaccgccgccggttcatctaccgccgccggtg
HTSGGCGCQPPPPVHLPPPV
catctgccacctccggttcacctgccacctccggtgcatctcccaccgccggtccacctg
HLPPPVHLPPPVHLPPPVHL
ccgccgccggtccacctgccaccgccggtccatgtgccgccgccggttcatctgccgccg
PPPVHLPPPVHVPPPVHLPP
ccaccatgccactaccctactcaaccgccccggcctcagcctcatccccagccacaccca
PPCHYPTQPPRPQPHPQPHP
tgcccgtgccaacagccgcatccaagcccgtgccagctgcagggaacctgcggcgttggc
CPCQQPHPSPCQLQGTCGVG
agcaccccgatcctgggccagtgcgtcgagtttctgaggcatcagtgcagcccgacggcg
STPILGQCVEFLRHQCSPTA
acgccctactgctcgcctcagtgccagtcgttgcggcagcagtgttgccagcagctcagg
TPYCSPQCQSLRQQCCQQLR
caggtggagccgcagcaccggtaccaggcgatcttcggcttggtcctccagtccatcctg
QVEPQHRYQAIFGLVLQSIL
cagcagcagccgcaaagcggccaggtcgcggggctgttggcggcgcagatagcgcagcaa
QQQPQSGQVAGLLAAQIAQQ
ctgacggcgatgtgcggcctgcagcagccgactccatgcccctacgctgctgccggcggt
LTAMCGLQQPTPCPYAAAGG
gtcccccacgcc
VPHA

Legend: signal peptide; repeat domain; Pro-X domain (the underline markings of the original figure are not reproduced)

FIG. 1A
RX3 sequence

atgagggtgttgctcgttgccctcgctctcctggctctcgctgcgagcgccacctccacg
MRVLLVALALLALAASATST
catacaagcggcggctgcggctgccagccaccgccgccggttcatctaccgccgccggtg
HTSGGCGCQPPPPVHLPPPV
catctgccacctccggttcacctgccacctccggtgcatctcccaccgccggtccacctg
HLPPPVHLPPPVHLPPPVHL
ccgccgccggtccacctgccaccgccggtccatgtgccgccgccggttcatctgccgccg
PPPVHLPPPVHVPPPVHLPP
ccaccatgccactaccctactcaaccgccccggcctcagcctcatccccagccacaccca
PPCHYPTQPPRPQPHPQPHP
tgcccgtgccaacagccgcatccaagcccgtgccagacc
CPCQQPHPSPCQT

R3 sequence

atgagggtgttgctcgttgccctcgctctcctggctctcgctgcgagcgccacctccacg
MRVLLVALALLALAASATST
catacaagcggcggctgcggctgccagccaccgccgccggttcatctaccgccgccggtg
HTSGGCGCQPPPPVHLPPPV
catctgccacctccggttcacctgccacctccggtgcatctcccaccgccggtccacctg
HLPPPVHLPPPVHLPPPVHL
ccgccgccggtccacctgccaccgccggtccatgtgccgccgccggttcatctgccgccg
PPPVHLPPPVHVPPPVHLPP
ccaccatgccactaccctactcaaccgccccggacc
PPCHYPTQPPRT

Legend: signal peptide; repeat domain; Pro-X domain (the underline markings of the original figure are not reproduced)

FIG. 1B
P4 sequence

atgagggtgttgctcgttgccctcgctctcctggctctcgctgcgagcgccacctccacg
MRVLLVALALLALAASATST
catacaagcggcggctgcggctgccagccaccgccgccggttcatctgccgccgccacca
HTSGGCGCQPPPPVHLPPPP
tgccactaccctacacaaccgccccggcctcagcctcatccccagccacacccatgcccg
CHYPTQPPRPQPHPQPHPCP
tgccaacagccgcatccaagcccgtgccagacc
CQQPHPSPCQT

X10 sequence

atgagggtgttgctcgttgccctcgctctcctggctctcgctgcgagcgccacctccacg
MRVLLVALALLALAASATST
catacaagcggcggctgcggctgccaatgccactaccctactcaaccgccccggcctcag
HTSGGCGCQCHYPTQPPRPQ
cctcatccccagccacacccatgcccgtgccaacagccgcatccaagcccgtgccagacc
PHPQPHPCPCQQPHPSPCQT

Legend: signal peptide; repeat domain; Pro-X domain (the underline markings of the original figure are not reproduced)

FIG. 1C
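
The pairing of nucleotide and amino acid lines in Figures 1A to 1C follows the standard genetic code. As a hedged illustration, the Python sketch below translates the X10 coding sequence (SEQ ID NO: 5) codon by codon. The helper name `translate` and the compact codon-table construction are conveniences introduced here and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order at each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# X10 coding sequence as given in Figure 1C / SEQ ID NO: 5 (180 nucleotides).
X10_DNA = (
    "atgagggtgttgctcgttgccctcgctctcctggctctcgctgcgagcgccacctccacg"
    "catacaagcggcggctgcggctgccaatgccactaccctactcaaccgccccggcctcag"
    "cctcatccccagccacacccatgcccgtgccaacagccgcatccaagcccgtgccagacc"
)


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid code."""
    dna = dna.lower()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


protein = translate(X10_DNA)
print(protein)
```

The same helper can be applied to the RX3, R3 and P4 coding sequences to reproduce the amino acid lines of the figures.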
        EK                    CT

1  NT   D D D D K             C S N L S T C
2  5'   GAC GAC GAC GAG AAG   TGC TCC AAC CTC TCT ACC TGC
3  5'                         TGC TCC AAT CTC TCT ACT TGC

1       V L G K L S Q
2       GTT CTT GGT AAG CTC TCT CAG
3       GTT CTG GGG AAG TTG AGT CAG

1       E L H K L Q T
2       GAG CTT CAC AAG CTC CAG ACT
3       GAA TTA CAT AAG CTG CAA ACT

1       Y P R T N T G S G T P G ter
2       TAC CCT AGA ACC AAC ACT GGT TCC GGT ACA CCT GGT TGA 3'
3       TAC CCG CGT ACC AAC ACT GGT TCT TCT ACC ACA 3'
FIG. 4

FIG. 7

FIG. 9
SEQUENCE LISTING



<110> Advanced in vitro Cell Technologies, S.L. 

<120> Production of products of interest by accumulation in endoplasmic 
reticulum-derived protein bodies 

<160> 10 

<210> 1 
<211> 672 
<212> DNA 
<213> Zea mays 



<400> 1

atgagggtgt tgctcgttgc cctcgctctc ctggctctcg ctgcgagcgc cacctccacg 60

catacaagcg gcggctgcgg ctgccagcca ccgccgccgg ttcatctacc gccgccggtg 120

catctgccac ctccggttca cctgccacct ccggtgcatc tcccaccgcc ggtccacctg 180

ccgccgccgg tccacctgcc accgccggtc catgtgccgc cgccggttca tctgccgccg 240

ccaccatgcc actaccctac tcaaccgccc cggcctcagc ctcatcccca gccacaccca 300

tgcccgtgcc aacagccgca tccaagcccg tgccagctgc agggaacctg cggcgttggc 360

agcaccccga tcctgggcca gtgcgtcgag tttctgaggc atcagtgcag cccgacggcg 420

acgccctact gctcgcctca gtgccagtcg ttgcggcagc agtgttgcca gcagctcagg 480

caggtggagc cgcagcaccg gtaccaggcg atcttcggct tggtcctcca gtccatcctg 540

cagcagcagc cgcaaagcgg ccaggtcgcg gggctgttgg cggcgcagat agcgcagcaa 600

ctgacggcga tgtgcggcct gcagcagccg actccatgcc cctacgctgc tgccggcggt 660

gtcccccacg cc 672



<210> 2 
<211> 339 
<212> DNA 
<213> 



<400> 2

atgagggtgt tgctcgttgc cctcgctctc ctggctctcg ctgcgagcgc cacctccacg 60

catacaagcg gcggctgcgg ctgccagcca ccgccgccgg ttcatctacc gccgccggtg 120

catctgccac ctccggttca cctgccacct ccggtgcatc tcccaccgcc ggtccacctg 180

ccgccgccgg tccacctgcc accgccggtc catgtgccgc cgccggttca tctgccgccg 240

ccaccatgcc actaccctac tcaaccgccc cggcctcagc ctcatcccca gccacaccca 300

tgcccgtgcc aacagccgca tccaagcccg tgccagacc 339



<210> 3 
<211> 276 
<212> DNA 
<213> 



<400> 3 

atgagggtgt tgctcgttgc cctcgctctc ctggctctcg ctgcgagcgc cacctccacg 60 

catacaagcg gcggctgcgg ctgccagcca ccgccgccgg ttcatctacc gccgccggtg 120 

catctgccac ctccggttca cctgccacct ccggtgcatc tcccaccgcc ggtccacctg 180 

ccgccgccgg tccacctgcc accgccggtc catgtgccgc cgccggttca tctgccgccg 240 

ccaccatgcc actaccctac tcaaccgccc cggacc 276 



<210> 4 
<211> 213 
<212> DNA
<213> 

<400> 4 

atgagggtgt tgctcgttgc cctcgctctc ctggctctcg ctgcgagcgc cacctccacg 60 

catacaagcg gcggctgcgg ctgccagcca ccgccgccgg ttcatctgcc gccgccacca 120 

tgccactacc ctacacaacc gccccggcct cagcctcatc cccagccaca cccatgcccg 180 

tgccaacagc cgcatccaag cccgtgccag acc 213 

<210> 5 
<211> 180 
<212> DNA 
<213> 

<400> 5 

atgagggtgt tgctcgttgc cctcgctctc ctggctctcg ctgcgagcgc cacctccacg 60 

catacaagcg gcggctgcgg ctgccaatgc cactacccta ctcaaccgcc ccggcctcag 120 

cctcatcccc agccacaccc atgcccgtgc caacagccgc atccaagccc gtgccagacc 180 

<210> 6 
<211> 224 
<212> PRT 
<213> Zea mays 

<400> 6

Met Arg Val Leu Leu Val Ala Leu Ala Leu Leu Ala Leu Ala Ala Ser
1 5 10 15

Ala Thr Ser Thr His Thr Ser Gly Gly Cys Gly Cys Gln Pro Pro Pro
20 25 30

Pro Val His Leu Pro Pro Pro Val His Leu Pro Pro Pro Val His Leu
35 40 45

Pro Pro Pro Val His Leu Pro Pro Pro Val His Leu Pro Pro Pro Val
50 55 60

His Leu Pro Pro Pro Val His Val Pro Pro Pro Val His Leu Pro Pro
65 70 75 80

Pro Pro Cys His Tyr Pro Thr Gln Pro Pro Arg Pro Gln Pro His Pro
85 90 95

Gln Pro His Pro Cys Pro Cys Gln Gln Pro His Pro Ser Pro Cys Gln
100 105 110

Leu Gln Gly Thr Cys Gly Val Gly Ser Thr Pro Ile Leu Gly Gln Cys
115 120 125

Val Glu Phe Leu Arg His Gln Cys Ser Pro Thr Ala Thr Pro Tyr Cys
130 135 140

Ser Pro Gln Cys Gln Ser Leu Arg Gln Gln Cys Cys Gln Gln Leu Arg
145 150 155 160

Gln Val Glu Pro Gln His Arg Tyr Gln Ala Ile Phe Gly Leu Val Leu
165 170 175

Gln Ser Ile Leu Gln Gln Gln Pro Gln Ser Gly Gln Val Ala Gly Leu
180 185 190

Leu Ala Ala Gln Ile Ala Gln Gln Leu Thr Ala Met Cys Gly Leu Gln
195 200 205

Gln Pro Thr Pro Cys Pro Tyr Ala Ala Ala Gly Gly Val Pro His Ala
210 215 220



<210> 7 
<211> 113 
<212> PRT 



<400> 7

Met Arg Val Leu Leu Val Ala Leu Ala Leu Leu Ala Leu Ala Ala Ser
1 5 10 15

Ala Thr Ser Thr His Thr Ser Gly Gly Cys Gly Cys Gln Pro Pro Pro
20 25 30

Pro Val His Leu Pro Pro Pro Val His Leu Pro Pro Pro Val His Leu
35 40 45

Pro Pro Pro Val His Leu Pro Pro Pro Val His Leu Pro Pro Pro Val
50 55 60

His Leu Pro Pro Pro Val His Val Pro Pro Pro Val His Leu Pro Pro
65 70 75 80

Pro Pro Cys His Tyr Pro Thr Gln Pro Pro Arg Pro Gln Pro His Pro
85 90 95

Gln Pro His Pro Cys Pro Cys Gln Gln Pro His Pro Ser Pro Cys Gln
100 105 110

Tyr



<210> 8 
<211> 92 
<212> PRT 



<400> 8

Met Arg Val Leu Leu Val Ala Leu Ala Leu Leu Ala Leu Ala Ala Ser
1 5 10 15

Ala Thr Ser Thr His Thr Ser Gly Gly Cys Gly Cys Gln Pro Pro Pro
20 25 30

Pro Val His Leu Pro Pro Pro Val His Leu Pro Pro Pro Val His Leu
35 40 45

Pro Pro Pro Val His Leu Pro Pro Pro Val His Leu Pro Pro Pro Val
50 55 60

His Leu Pro Pro Pro Val His Val Pro Pro Pro Val His Leu Pro Pro
65 70 75 80

Pro Pro Cys His Tyr Pro Thr Gln Pro Pro Arg Tyr
85 90

<210> 9 
<211> 71 
<212> PRT 

<400> 9

Met Arg Val Leu Leu Val Ala Leu Ala Leu Leu Ala Leu Ala Ala Ser
1 5 10 15

Ala Thr Ser Thr His Thr Ser Gly Gly Cys Gly Cys Gln Pro Pro Pro
20 25 30

Pro Val His Leu Pro Pro Pro Pro Cys His Tyr Pro Thr Gln Pro Pro
35 40 45

Arg Pro Gln Pro His Pro Gln Pro His Pro Cys Pro Cys Gln Gln Pro
50 55 60

His Pro Ser Pro Cys Gln Tyr
65 70

<210> 10 
<211> 60 
<212> PRT 

<400> 10

Met Arg Val Leu Leu Val Ala Leu Ala Leu Leu Ala Leu Ala Ala Ser
1 5 10 15

Ala Thr Ser Thr His Thr Ser Gly Gly Cys Gly Cys Gln Cys His Tyr
20 25 30

Pro Thr Gln Pro Pro Arg Pro Gln Pro His Pro Gln Pro His Pro Cys
35 40 45

Pro Cys Gln Gln Pro His Pro Ser Pro Cys Gln Tyr
50 55 60